Minimal entropy approximation for cellular automata 



Henryk Fuks 



Department of Mathematics 
Brock University 
St. Catharines, Ontario L2S 3A1, 
Canada 
Email: hfuks@brocku.ca 



Abstract 



We present a method for construction of approximate orbits of measures under the 
action of cellular automata which is complementary to the local structure theory. The 
local structure theory is based on the idea of Bayesian extension, that is, construction of 
a probability measure consistent with given block probabilities and maximizing entropy. 
If instead of maximizing entropy one minimizes it, one can develop another method for 
construction of approximate orbits, at the heart of which is the iteration of 
finite-dimensional maps, called minimal entropy maps. We present numerical evidence that 
the minimal entropy approximation sometimes spectacularly outperforms the local structure 
theory in characterizing properties of cellular automata. The density response curve 
for elementary CA rule 26 is used to illustrate this claim. 

1. Introduction 

Let $\mathcal{A} = \{0, 1, \ldots, N-1\}$ be called an alphabet, or a symbol set, and let $X = \mathcal{A}^{\mathbb{Z}}$. Elements of $X$ will be called configurations. A finite sequence of elements of $\mathcal{A}$, $\mathbf{b} = b_1 b_2 \ldots b_n$, will be called a block (or word) of length $n$. The set of all blocks of elements of $\mathcal{A}$ of all possible lengths will be denoted by $\mathcal{A}^*$. The cylinder set generated by the block $\mathbf{b} = b_1 b_2 \ldots b_n$ and anchored at $i$ is defined as

$$[\mathbf{b}]_i = \{\mathbf{x} \in \mathcal{A}^{\mathbb{Z}} : \mathbf{x}_{[i,i+n)} = \mathbf{b}\}. \tag{1}$$

The appropriate mathematical description of a distribution of configurations is a probability measure on $X$. Cellular automata (CA) are often considered as maps in the space of such probability measures [1, 2, 3, 4].

In this paper, we will be interested in shift-invariant probability measures over $X$, or more precisely, in shift-invariant probability measures on the $\sigma$-algebra generated by elementary cylinder sets of $X$. The set of such measures will be denoted by $\mathfrak{M}_{\sigma}(X)$. Detailed construction of measures in $\mathfrak{M}_{\sigma}(X)$ is described in the review article [5], and the interested reader is advised to consult this reference. Nevertheless, it is not necessary to be familiar with the details of the construction in order to follow the present paper.

The most important feature of any measure $\mu \in \mathfrak{M}_{\sigma}(X)$ is that it is fully determined by measures of cylinder sets $\mu([\mathbf{a}]_i)$ for all $\mathbf{a} \in \mathcal{A}^*$, which we will denote by

$$P(\mathbf{a}) = \mu([\mathbf{a}]_i). \tag{2}$$

Note that $P(\mathbf{a})$, which we will call block probability, is independent of $i$ due to shift-invariance of the measure $\mu$. Block probability $P(\mathbf{a})$ can be intuitively understood as the probability of occurrence of a given block $\mathbf{a}$ in the distribution of configurations.

The following theorem formally states the connection between block probabilities and measures. It is a direct consequence of the Hahn-Kolmogorov extension theorem. For the proof the reader can consult [5] and references therein.

Theorem 1.1 Let $P : \mathcal{A}^* \to [0, 1]$ satisfy the conditions

$$P(\mathbf{b}) = \sum_{a \in \mathcal{A}} P(\mathbf{b}a) = \sum_{a \in \mathcal{A}} P(a\mathbf{b}) \quad \text{for all } \mathbf{b} \in \mathcal{A}^*, \tag{3}$$

$$1 = \sum_{a \in \mathcal{A}} P(a). \tag{4}$$

Then $P$ uniquely determines a shift-invariant probability measure on the $\sigma$-algebra generated by elementary cylinder sets of $X$.

The conditions (3) and (4) are known in the literature as consistency conditions.
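
As an illustration of the consistency conditions, the following minimal sketch checks them numerically for the block probabilities of a Bernoulli measure over the binary alphabet. The function names (`bernoulli_block_prob`, `consistency_defect`) and the choice of representing blocks as strings of '0' and '1' are assumptions made only for this example.

```python
from itertools import product

def bernoulli_block_prob(block, p):
    """Probability of a binary block (string of '0'/'1') under a Bernoulli measure with P(1) = p."""
    ones = block.count("1")
    return p ** ones * (1 - p) ** (len(block) - ones)

def consistency_defect(P, max_len):
    """Largest absolute violation of conditions (3) and (4) for blocks of length <= max_len."""
    worst = abs(1 - (P("0") + P("1")))  # condition (4)
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            b = "".join(bits)
            right_ext = P(b + "0") + P(b + "1")  # sum over a of P(ba)
            left_ext = P("0" + b) + P("1" + b)   # sum over a of P(ab)
            worst = max(worst, abs(P(b) - right_ext), abs(P(b) - left_ext))
    return worst

# For a Bernoulli measure the defect should vanish up to rounding error.
defect = consistency_defect(lambda b: bernoulli_block_prob(b, 0.3), max_len=4)
print(defect)
```

Any candidate table of block probabilities can be passed in place of the Bernoulli one; a defect that is not close to zero means the table cannot come from a shift-invariant measure.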

Although the set of all block probabilities $\{P(\mathbf{a}) : \mathbf{a} \in \mathcal{A}^*\}$ is countable, it is still infinite, and in many practical problems, such as computer simulations, it is often possible to keep track of only a finite number of block probabilities. This raises an important question: if we know probabilities of all blocks of a given length, can we reconstruct the entire measure approximately? One answer to this question is well known and called "Bayesian extension", originally introduced in the context of lattice gases [6, 7]. The approximate measure produced by the Bayesian extension is known as a "finite-block measure" or as a "Markov process with memory". The aforementioned review paper [5] discusses details of the Bayesian extension. The main feature of this extension is that, given a finite set of probabilities, it constructs all other block probabilities, satisfying consistency conditions, such that the resulting measure has the maximal entropy.

The Bayesian extension proved to be a very useful device in statistical physics as well as in the theory of cellular automata. In 1987, H. A. Gutowitz, J. D. Victor, and B. W. Knight [8] proposed a generalization of the mean-field theory for cellular automata based on the idea of Bayesian extension. They called it local structure theory. The local structure theory, recently formalized and extended [5], turned out to be a very powerful tool for characterization of cellular automata.

Given the success of the local structure theory, which is based on the maximal entropy 
approximation, it seems quite natural to ask how useful the complementary approximation 
would be, namely the one which minimizes the entropy instead of maximizing it? To the 
knowledge of the author, no one has ever pursued this idea, and this article is intended to 
partially fill this gap. 

In what follows, we will investigate the minimal entropy approximation in a configuration 
space over a binary alphabet, that is, assuming $\mathcal{A} = \{0, 1\}$. Although the ideas presented below 
can be easily carried over to alphabets of higher cardinality, the binary case is the simplest 
and the most elegant one, and that is the only reason why we restrict our attention to 
$\mathcal{A} = \{0, 1\}$. 

2. Minimal entropy extension 

Before we proceed, let us define $\mathbf{P}^{(k)}$ to be the column vector of all probabilities of blocks of length $k$ arranged in lexical order. For example, for $\mathcal{A} = \{0, 1\}$, the first three vectors $\mathbf{P}^{(k)}$ are

$$\mathbf{P}^{(1)} = [P(0), P(1)]^T,$$

$$\mathbf{P}^{(2)} = [P(00), P(01), P(10), P(11)]^T,$$

$$\mathbf{P}^{(3)} = [P(000), P(001), P(010), P(011), P(100), P(101), P(110), P(111)]^T.$$

The entropy of $\mathbf{P}^{(k)}$ will be defined as

$$h(\mathbf{P}^{(k)}) = -\sum_{\mathbf{b} \in \mathcal{A}^k} P(\mathbf{b}) \log P(\mathbf{b}). \tag{5}$$

Suppose that for a given probability measure we know all block probabilities $\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(n)}$. We want to construct block probabilities $\mathbf{P}^{(n+1)}$ which minimize the entropy $h(\mathbf{P}^{(n+1)})$ and are consistent with block probabilities $\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(n)}$.

In order to do this, we first must remark that not all block probabilities which are elements of vectors $\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(n)}$ are independent, due to consistency conditions. In [5], we demonstrated that for $\mathcal{A} = \{0, 1\}$, only $2^{n-1}$ block probabilities are independent. Which ones are declared to be independent, and which ones are treated as dependent, is to some extent arbitrary. One choice of independent probabilities is called short form representation [5]. For the binary alphabet, in the short form representation block probabilities which have the form $P(0\mathbf{a}0)$ are declared to be independent, and the remaining ones are treated as dependent. For example, among elements of $\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \mathbf{P}^{(3)}$ the independent probabilities are $P(0)$, $P(00)$, $P(000)$, $P(010)$. The remaining ones can be expressed as



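$$\begin{bmatrix} P(001) \\ P(011) \\ P(100) \\ P(101) \\ P(110) \\ P(111) \end{bmatrix} = \begin{bmatrix} P(00) - P(000) \\ P(0) - P(00) - P(010) \\ P(00) - P(000) \\ P(0) - 2P(00) + P(000) \\ P(0) - P(00) - P(010) \\ 1 - 3P(0) + 2P(00) + P(010) \end{bmatrix},$$

$$\begin{bmatrix} P(01) \\ P(10) \\ P(11) \end{bmatrix} = \begin{bmatrix} P(0) - P(00) \\ P(0) - P(00) \\ 1 - 2P(0) + P(00) \end{bmatrix},$$

$$P(1) = 1 - P(0). \tag{6}$$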
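
The short form representation for blocks of length up to 3 can be sketched in a few lines of code. The function below, whose name `expand_short_form` is hypothetical, takes the four independent probabilities and returns all fourteen block probabilities by applying eq. (6) literally; it is only an illustration, not part of the construction in [5].

```python
def expand_short_form(p0, p00, p000, p010):
    """All block probabilities of length <= 3 from P(0), P(00), P(000), P(010), following eq. (6)."""
    P = {"0": p0, "00": p00, "000": p000, "010": p010}
    P["1"] = 1 - p0
    P["01"] = p0 - p00
    P["10"] = p0 - p00
    P["11"] = 1 - 2 * p0 + p00
    P["001"] = p00 - p000
    P["011"] = p0 - p00 - p010
    P["100"] = p00 - p000
    P["101"] = p0 - 2 * p00 + p000
    P["110"] = p0 - p00 - p010
    P["111"] = 1 - 3 * p0 + 2 * p00 + p010
    return P

# Example: a Bernoulli measure with P(0) = 0.7, so P(00) = 0.49, P(000) = 0.343, P(010) = 0.147.
P = expand_short_form(0.7, 0.49, 0.343, 0.147)
print(P["111"], P["101"])
```

For the Bernoulli input above, the dependent entries should agree with the corresponding products of 0.7 and 0.3 up to rounding, which gives a quick sanity check of eq. (6).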

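Coming back to our problem, if we want to construct $\mathbf{P}^{(n+1)}$ given $\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(n)}$, we are free to choose only the values of elements of $\mathbf{P}^{(n+1)}$ which are of the form $P(0\mathbf{a}0)$, where $\mathbf{a} \in \mathcal{A}^{n-1}$. These probabilities will be denoted by $x_{\mathbf{a}}$, and the remaining ones can be expressed in terms of $x_{\mathbf{a}}$ and probabilities of shorter blocks,

$$\begin{aligned}
P(0\mathbf{a}0) &= x_{\mathbf{a}}, \\
P(0\mathbf{a}1) &= P(0\mathbf{a}) - x_{\mathbf{a}}, \\
P(1\mathbf{a}0) &= P(\mathbf{a}0) - P(0\mathbf{a}0) = P(\mathbf{a}0) - x_{\mathbf{a}}, \\
P(1\mathbf{a}1) &= P(\mathbf{a}1) - P(0\mathbf{a}1) = P(\mathbf{a}1) - (P(0\mathbf{a}) - x_{\mathbf{a}}).
\end{aligned} \tag{7}$$

The problem is now as follows: how to choose parameters $x_{\mathbf{a}}$ in order to minimize the entropy $h(\mathbf{P}^{(n+1)})$? The following theorem provides the answer.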

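Theorem 2.1 Suppose that $\mu$ is a shift-invariant probability measure, and $P(\mathbf{b}) = \mu([\mathbf{b}]_i)$. Let

$$\begin{aligned}
\tilde{P}(0\mathbf{a}0) &= \tilde{x}_{\mathbf{a}}, \\
\tilde{P}(0\mathbf{a}1) &= P(0\mathbf{a}) - \tilde{x}_{\mathbf{a}}, \\
\tilde{P}(1\mathbf{a}0) &= P(\mathbf{a}0) - \tilde{x}_{\mathbf{a}}, \\
\tilde{P}(1\mathbf{a}1) &= P(\mathbf{a}1) - (P(0\mathbf{a}) - \tilde{x}_{\mathbf{a}}),
\end{aligned} \tag{8}$$

where

$$\tilde{x}_{\mathbf{a}} = \begin{cases} \max\{0, P(0\mathbf{a}) - P(\mathbf{a}1)\} & \text{if } |P(\mathbf{a}1) - P(0\mathbf{a})| < |P(\mathbf{a}0) - P(0\mathbf{a})|, \\ \min\{P(0\mathbf{a}), P(\mathbf{a}0)\} & \text{otherwise.} \end{cases} \tag{9}$$

Then

$$-\sum_{\mathbf{b} \in \mathcal{A}^{n+1}} \tilde{P}(\mathbf{b}) \log \tilde{P}(\mathbf{b}) \le -\sum_{\mathbf{b} \in \mathcal{A}^{n+1}} P(\mathbf{b}) \log P(\mathbf{b}). \tag{10}$$
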
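Proof. Let us first notice that $P(0\mathbf{a})$, $P(\mathbf{a}0)$, and $P(\mathbf{a}1)$ are not independent. Consistency conditions imply that

$$P(0\mathbf{a}) \le P(\mathbf{a}1) + P(\mathbf{a}0) \le 1, \tag{11}$$

and from there we obtain

$$P(0\mathbf{a}) - P(\mathbf{a}0) \le P(\mathbf{a}1) \le 1 - P(\mathbf{a}0). \tag{12}$$

Denoting $\alpha_{\mathbf{a}} = P(0\mathbf{a})$, $\beta_{\mathbf{a}} = P(\mathbf{a}0)$, $\delta_{\mathbf{a}} = P(\mathbf{a}1)$, this can be written as

$$\alpha_{\mathbf{a}} - \beta_{\mathbf{a}} \le \delta_{\mathbf{a}} \le 1 - \beta_{\mathbf{a}}. \tag{13}$$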

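The right hand side of inequality (10) can be written as

$$-\sum_{\mathbf{b} \in \mathcal{A}^{n+1}} P(\mathbf{b}) \log P(\mathbf{b}) = -\sum_{\mathbf{a} \in \mathcal{A}^{n-1}} \Big( P(0\mathbf{a}0) \log P(0\mathbf{a}0) + P(0\mathbf{a}1) \log P(0\mathbf{a}1) + P(1\mathbf{a}0) \log P(1\mathbf{a}0) + P(1\mathbf{a}1) \log P(1\mathbf{a}1) \Big). \tag{14}$$

Using eqs. (7) this becomes

$$-\sum_{\mathbf{b} \in \mathcal{A}^{n+1}} P(\mathbf{b}) \log P(\mathbf{b}) = \sum_{\mathbf{a} \in \mathcal{A}^{n-1}} H_{\mathbf{a}}(x_{\mathbf{a}}), \tag{15}$$

where we define

$$\begin{aligned}
H_{\mathbf{a}}(x_{\mathbf{a}}) = {} & -x_{\mathbf{a}} \log x_{\mathbf{a}} - (\alpha_{\mathbf{a}} - x_{\mathbf{a}}) \log(\alpha_{\mathbf{a}} - x_{\mathbf{a}}) \\
& -(\beta_{\mathbf{a}} - x_{\mathbf{a}}) \log(\beta_{\mathbf{a}} - x_{\mathbf{a}}) - (\delta_{\mathbf{a}} - (\alpha_{\mathbf{a}} - x_{\mathbf{a}})) \log(\delta_{\mathbf{a}} - (\alpha_{\mathbf{a}} - x_{\mathbf{a}})).
\end{aligned} \tag{16}$$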

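The function $H_{\mathbf{a}}(x_{\mathbf{a}})$ is concave, and $x_{\mathbf{a}}$ can only take values from some closed interval $[x_{\mathbf{a},1}, x_{\mathbf{a},2}]$. For this reason, $H_{\mathbf{a}}(x_{\mathbf{a}})$ reaches its minimum at one of the endpoints of the interval. We will show that the minimum occurs precisely at $x_{\mathbf{a}} = \tilde{x}_{\mathbf{a}}$, where $\tilde{x}_{\mathbf{a}}$ is defined in eq. (9). First, let us determine the values of the endpoints $x_{\mathbf{a},1}$, $x_{\mathbf{a},2}$. In order to do this, note that obviously $x_{\mathbf{a}} \in [0, 1]$. By consistency conditions,

$$P(\mathbf{a}1) = P(0\mathbf{a}1) + P(1\mathbf{a}1), \tag{17}$$

$$P(\mathbf{a}0) = P(0\mathbf{a}0) + P(1\mathbf{a}0), \tag{18}$$

$$P(0\mathbf{a}) = P(0\mathbf{a}0) + P(0\mathbf{a}1), \tag{19}$$

and therefore

$$P(\mathbf{a}1) \ge P(0\mathbf{a}1), \tag{20}$$

$$P(\mathbf{a}0) \ge P(0\mathbf{a}0), \tag{21}$$

$$P(0\mathbf{a}) \ge P(0\mathbf{a}0). \tag{22}$$

Using eqs. (7) and the notation $\alpha_{\mathbf{a}} = P(0\mathbf{a})$, $\beta_{\mathbf{a}} = P(\mathbf{a}0)$, $\delta_{\mathbf{a}} = P(\mathbf{a}1)$, this becomes

$$\delta_{\mathbf{a}} \ge \alpha_{\mathbf{a}} - x_{\mathbf{a}}, \tag{23}$$

$$\beta_{\mathbf{a}} \ge x_{\mathbf{a}}, \tag{24}$$

$$\alpha_{\mathbf{a}} \ge x_{\mathbf{a}}. \tag{25}$$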
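Solving the above system of inequalities for $x_{\mathbf{a}}$ we obtain

$$\alpha_{\mathbf{a}} - \delta_{\mathbf{a}} \le x_{\mathbf{a}} \le \min\{\alpha_{\mathbf{a}}, \beta_{\mathbf{a}}\}. \tag{26}$$

Since $x_{\mathbf{a}} \in [0, 1]$, we obtain the following expression for the endpoints of the interval,

$$x_{\mathbf{a},1} = \max\{0, \alpha_{\mathbf{a}} - \delta_{\mathbf{a}}\}, \qquad x_{\mathbf{a},2} = \min\{\alpha_{\mathbf{a}}, \beta_{\mathbf{a}}\}. \tag{27}$$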

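Suppose now that we fix both $\mathbf{a}$ and $\alpha_{\mathbf{a}}$. Let us consider separately the four cases described in the table below.

$$\begin{array}{c|c|c}
 & \beta_{\mathbf{a}} < \alpha_{\mathbf{a}} & \beta_{\mathbf{a}} \ge \alpha_{\mathbf{a}} \\ \hline
\delta_{\mathbf{a}} < \alpha_{\mathbf{a}} & x_{\mathbf{a},1} = \alpha_{\mathbf{a}} - \delta_{\mathbf{a}},\ \ x_{\mathbf{a},2} = \beta_{\mathbf{a}} & x_{\mathbf{a},1} = \alpha_{\mathbf{a}} - \delta_{\mathbf{a}},\ \ x_{\mathbf{a},2} = \alpha_{\mathbf{a}} \\ \hline
\delta_{\mathbf{a}} \ge \alpha_{\mathbf{a}} & x_{\mathbf{a},1} = 0,\ \ x_{\mathbf{a},2} = \beta_{\mathbf{a}} & x_{\mathbf{a},1} = 0,\ \ x_{\mathbf{a},2} = \alpha_{\mathbf{a}}
\end{array}$$
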

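We will determine the sign of $H_{\mathbf{a}}(x_{\mathbf{a},1}) - H_{\mathbf{a}}(x_{\mathbf{a},2})$. If $H_{\mathbf{a}}(x_{\mathbf{a},1}) - H_{\mathbf{a}}(x_{\mathbf{a},2}) < 0$, then the minimum of $H_{\mathbf{a}}$ occurs at $x_{\mathbf{a},1}$, otherwise at $x_{\mathbf{a},2}$. To avoid notational clutter, we will drop the index $\mathbf{a}$ from $\alpha_{\mathbf{a}}$, $\beta_{\mathbf{a}}$, $\delta_{\mathbf{a}}$.

Case 1: $\beta < \alpha$, $\delta < \alpha$.
We have

$$H_{\mathbf{a}}(x_{\mathbf{a},1}) - H_{\mathbf{a}}(x_{\mathbf{a},2}) = \beta \log \beta + (\alpha - \beta) \log(\alpha - \beta) - (\alpha - \delta) \log(\alpha - \delta) - \delta \log \delta. \tag{28}$$

Defining $f_p(x) = x \log x + (p - x) \log(p - x)$ we can write

$$H_{\mathbf{a}}(x_{\mathbf{a},1}) - H_{\mathbf{a}}(x_{\mathbf{a},2}) = f_\alpha(\beta) - f_\alpha(\delta). \tag{29}$$

The function $f_\alpha(x)$, defined on the interval $x \in (0, \alpha)$, reaches its minimum at $x = \alpha/2$, and has the property $f_\alpha(x) = f_\alpha(\alpha - x)$. This means that $f_\alpha(\delta) > f_\alpha(\beta)$, and thus $H_{\mathbf{a}}(x_{\mathbf{a},1}) - H_{\mathbf{a}}(x_{\mathbf{a},2}) < 0$, if and only if

$$\left| \delta - \frac{\alpha}{2} \right| > \left| \beta - \frac{\alpha}{2} \right|. \tag{30}$$
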



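Case 2: $\beta < \alpha$, $\delta \ge \alpha$.
We have

$$\begin{aligned}
H_{\mathbf{a}}(x_{\mathbf{a},1}) - H_{\mathbf{a}}(x_{\mathbf{a},2}) &= -\alpha \log \alpha - (\delta - \alpha) \log(\delta - \alpha) \\
&\quad + (\alpha - \beta) \log(\alpha - \beta) + (\beta - \alpha + \delta) \log(\beta - \alpha + \delta) \\
&= f_\delta(\alpha - \beta) - f_\delta(\alpha).
\end{aligned} \tag{31}$$

For the same reason as before, $H_{\mathbf{a}}(x_{\mathbf{a},1}) - H_{\mathbf{a}}(x_{\mathbf{a},2}) < 0$ if and only if

$$\left| \alpha - \frac{\delta}{2} \right| > \left| \alpha - \beta - \frac{\delta}{2} \right|. \tag{32}$$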
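Case 3: $\beta \ge \alpha$, $\delta < \alpha$.
We have

$$\begin{aligned}
H_{\mathbf{a}}(x_{\mathbf{a},1}) - H_{\mathbf{a}}(x_{\mathbf{a},2}) &= -(\alpha - \delta) \log(\alpha - \delta) - (\beta - \alpha + \delta) \log(\beta - \alpha + \delta) \\
&\quad + \alpha \log \alpha + (\beta - \alpha) \log(\beta - \alpha) \\
&= f_\beta(\alpha) - f_\beta(\alpha - \delta).
\end{aligned} \tag{33}$$

Again, by the property of $f_p$ discussed under Case 1, $H_{\mathbf{a}}(x_{\mathbf{a},1}) - H_{\mathbf{a}}(x_{\mathbf{a},2}) < 0$ is equivalent to

$$\left| \alpha - \delta - \frac{\beta}{2} \right| > \left| \alpha - \frac{\beta}{2} \right|. \tag{34}$$
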



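Case 4: $\beta \ge \alpha$, $\delta \ge \alpha$.
We have

$$\begin{aligned}
H_{\mathbf{a}}(x_{\mathbf{a},1}) - H_{\mathbf{a}}(x_{\mathbf{a},2}) &= -\beta \log \beta - (\delta - \alpha) \log(\delta - \alpha) + (\beta - \alpha) \log(\beta - \alpha) + \delta \log \delta \\
&= g(\delta) - g(\beta),
\end{aligned} \tag{35}$$

where $g(y) = y \log y - (y - \alpha) \log(y - \alpha)$. Since for $y > \alpha$

$$g'(y) = \log y - \log(y - \alpha) > 0, \tag{36}$$

$g(y)$ is increasing for $y > \alpha$. This means that $g(\delta) < g(\beta)$, or equivalently $H_{\mathbf{a}}(x_{\mathbf{a},1}) - H_{\mathbf{a}}(x_{\mathbf{a},2}) < 0$, is satisfied if and only if

$$\delta < \beta. \tag{37}$$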



We obtained four inequalities (30), (32), (34), and (37) for four cases. We plotted in Figure 1 solutions of these four inequalities in $\beta$-$\delta$ space using different colors for each case. One can, however, combine all four cases and describe them by one simple inequality, taking into account the fact that only values of $(\beta, \delta)$ marked by vertical hatching are possible, due to condition (13). This simple inequality combining all four cases (subject to condition (13)) is

$$|\delta - \alpha| < |\beta - \alpha|, \tag{38}$$

as one can easily verify graphically by comparing Figures 1 and 2.

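To summarize our findings, we demonstrated that the minimum of $H_{\mathbf{a}}$ occurs at $\tilde{x}_{\mathbf{a}}$, where

$$\tilde{x}_{\mathbf{a}} = \begin{cases} x_{\mathbf{a},1} & \text{if } |\delta_{\mathbf{a}} - \alpha_{\mathbf{a}}| < |\beta_{\mathbf{a}} - \alpha_{\mathbf{a}}|, \\ x_{\mathbf{a},2} & \text{otherwise,} \end{cases} \tag{39}$$

and where $x_{\mathbf{a},1}$ and $x_{\mathbf{a},2}$ are defined in eq. (27). This is precisely eq. (9), and Theorem 2.1 then follows directly. □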
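
The endpoint rule can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, the convention 0 log 0 = 0 is assumed, and the brute-force grid search is a substitute for the analytical argument, not the method of the proof. It evaluates the function of eq. (16) on a fine grid over the admissible interval of eq. (27) and compares the grid minimum with the value at the endpoint selected by eq. (39).

```python
import math

def xlogx(y):
    """y * log(y) with the convention 0 log 0 = 0 (tiny negative rounding errors are clipped)."""
    return 0.0 if y <= 0.0 else y * math.log(y)

def H(x, alpha, beta, delta):
    """The function H_a(x) of eq. (16)."""
    return -(xlogx(x) + xlogx(alpha - x) + xlogx(beta - x) + xlogx(delta - (alpha - x)))

def x_tilde(alpha, beta, delta):
    """Endpoint selected by eq. (39), with endpoints given by eq. (27)."""
    if abs(delta - alpha) < abs(beta - alpha):
        return max(0.0, alpha - delta)
    return min(alpha, beta)

def grid_minimum(alpha, beta, delta, steps=2000):
    """Brute-force minimum of H over the admissible interval, sampled on a uniform grid."""
    lo, hi = max(0.0, alpha - delta), min(alpha, beta)
    return min(H(lo + (hi - lo) * i / steps, alpha, beta, delta) for i in range(steps + 1))

# One (alpha, beta, delta) triple for each of the four cases; all satisfy condition (13).
cases = [(0.5, 0.2, 0.4), (0.4, 0.1, 0.5), (0.3, 0.4, 0.1), (0.2, 0.5, 0.3)]
for alpha, beta, delta in cases:
    print(alpha, beta, delta, x_tilde(alpha, beta, delta))
```

In each of the four sample cases the value of H at the selected endpoint should not exceed the grid minimum beyond rounding error.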
Figure 1: Solutions of inequalities (30), (32), (34) and (37) in $\beta$-$\delta$ space, shown, respectively, in blue, green, red, and yellow. The region with vertical hatching represents the solution of inequality (13), and the region with horizontal hatching represents parameters for which the minimum of $H_{\mathbf{a}}$ occurs at $x_{\mathbf{a},1}$. Two scenarios are shown, corresponding to $\alpha > 0.5$ (left) and $\alpha < 0.5$ (right).

Figure 2: Simplification of Figure 1. Solutions of inequalities (30), (32), (34) and (37) have been replaced by the solution of $|\delta - \alpha| < |\beta - \alpha|$, shown in cyan.

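3. Minimal entropy approximation for measures 

Using Theorem 2.1, we can now construct an approximation of a probability measure complementary to the Bayesian approximation. Let us define

$$T(\alpha, \beta, \delta) = \begin{cases} \max\{0, \alpha - \delta\} & \text{if } |\delta - \alpha| < |\beta - \alpha|, \\ \min\{\alpha, \beta\} & \text{otherwise,} \end{cases} \tag{40}$$

and

$$T_{0,0}(\alpha, \beta, \delta) = T(\alpha, \beta, \delta), \tag{41}$$

$$T_{0,1}(\alpha, \beta, \delta) = \alpha - T(\alpha, \beta, \delta), \tag{42}$$

$$T_{1,0}(\alpha, \beta, \delta) = \beta - T(\alpha, \beta, \delta), \tag{43}$$

$$T_{1,1}(\alpha, \beta, \delta) = \delta - \alpha + T(\alpha, \beta, \delta). \tag{44}$$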



Using this notation and eq. (8), one can now express probabilities of $(k+1)$-blocks by probabilities of $k$-blocks, by writing

$$P(a_1 a_2 \ldots a_{k+1}) \approx T_{a_1, a_{k+1}}\big(P(0 a_2 \ldots a_k),\, P(a_2 \ldots a_k 0),\, P(a_2 \ldots a_k 1)\big). \tag{45}$$

The above approximation will be called minimal entropy approximation. We can, of course, repeat this process, and approximate $(k+2)$-block probabilities by $(k+1)$-block probabilities, and, by applying the approximation again, $(k+1)$-block probabilities by $k$-block probabilities. For a given integer $k$, by recursive application of the minimal entropy approximation, any block probability of length $p > k$ can be expressed by probabilities of length $k$. The following proposition states this more formally.

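Proposition 3.1 Let $\mu \in \mathfrak{M}_{\sigma}(X)$ be a measure with associated block probabilities $P : \mathcal{A}^* \to [0, 1]$, $P(\mathbf{b}) = \mu([\mathbf{b}]_i)$ for all $i \in \mathbb{Z}$ and $\mathbf{b} \in \mathcal{A}^*$. For $k > 0$, define $\tilde{P} : \mathcal{A}^* \to [0, 1]$ recursively so that

$$\tilde{P}(a_1 a_2 \ldots a_p) = \begin{cases} P(a_1 a_2 \ldots a_p) & \text{if } p \le k, \\ T_{a_1, a_p}\big(\tilde{P}(0 a_2 \ldots a_{p-1}),\, \tilde{P}(a_2 \ldots a_{p-1} 0),\, \tilde{P}(a_2 \ldots a_{p-1} 1)\big) & \text{if } p > k. \end{cases} \tag{46}$$

Then $\tilde{P}$ determines a shift-invariant probability measure $\tilde{\mu}^{(k)} \in \mathfrak{M}_{\sigma}(X)$, to be called the minimal entropy approximation of $\mu$ of order $k$.

The proof that $\tilde{P}$ determines a measure is a direct consequence of the definition of $\tilde{P}$: by construction, $\tilde{P}$ satisfies the consistency conditions (3) and (4), so that Theorem 1.1 applies.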
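
The recursive definition can be sketched directly in code. In the example below, `make_extension` and the other names are hypothetical, blocks are strings of '0' and '1', only blocks of length at least k are handled, and exact rational arithmetic (`fractions.Fraction`) is an assumption made so that consistency can be checked without rounding error.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

def T(alpha, beta, delta):
    """The function T of eq. (40)."""
    if abs(delta - alpha) < abs(beta - alpha):
        return max(0, alpha - delta)
    return min(alpha, beta)

def T_ij(i, j, alpha, beta, delta):
    """The functions T_{i,j} of eqs. (41)-(44)."""
    t = T(alpha, beta, delta)
    return {(0, 0): t, (0, 1): alpha - t, (1, 0): beta - t, (1, 1): delta - alpha + t}[(i, j)]

def make_extension(P_k, k):
    """Return P-tilde for blocks of length >= k, given a dict P_k of all k-block probabilities."""
    @lru_cache(maxsize=None)
    def P_tilde(block):
        if len(block) == k:
            return P_k[block]
        a = block[1:-1]  # interior of the block
        return T_ij(int(block[0]), int(block[-1]),
                    P_tilde("0" + a), P_tilde(a + "0"), P_tilde(a + "1"))
    return P_tilde

# Example input: 3-block probabilities of a Bernoulli measure with P(1) = 1/3.
p = Fraction(1, 3)
P3 = {"".join(b): p ** b.count("1") * (1 - p) ** b.count("0") for b in product("01", repeat=3)}
P_tilde = make_extension(P3, 3)
total = sum(P_tilde("".join(b)) for b in product("01", repeat=5))
print(total)
```

Summing the extended probabilities over all 5-blocks, or over the one-symbol extensions of any fixed 4-block, gives a direct check of the consistency conditions for the extended table.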

It is intuitively clear that as the order of the minimal entropy approximations increases, 
the quality of the approximation should increase too. The following proposition formalizes 
this observation. 

Proposition 3.2 The sequence of $k$-th order minimal entropy approximations of $\mu \in \mathfrak{M}_{\sigma}(X)$ weakly converges to $\mu$ as $k \to \infty$.

Proof: Let $n > 0$, $\mathbf{b} \in \mathcal{A}^n$ and let $P_k(\mathbf{b}) = \tilde{\mu}^{(k)}([\mathbf{b}]_0)$, $P(\mathbf{b}) = \mu([\mathbf{b}]_0)$. Since for $k \ge n$ we have $P_k(\mathbf{b}) = P(\mathbf{b})$, we obviously have $\lim_{k \to \infty} P_k(\mathbf{b}) = P(\mathbf{b})$. Since cylinder sets constitute a convergence determining class for measures in $\mathfrak{M}_{\sigma}(X)$, convergence of block probabilities is equivalent to weak convergence. This leads to the conclusion that $\tilde{\mu}^{(k)} \Rightarrow \mu$. □

Measures $\mu$ for which $\tilde{\mu}^{(k)} = \mu$ will be called $k$-th order measures of minimal entropy. The set of such measures over $X$ will be denoted by $\mathfrak{M}^{(k)}_{\sigma}(X)$. Obviously these measures are shift-invariant, $\mathfrak{M}^{(k)}_{\sigma}(X) \subset \mathfrak{M}_{\sigma}(X)$.

4. Orbits of measures under the action of cellular automata 

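Let $w : \mathcal{A} \times \mathcal{A}^{2r+1} \to [0, 1]$, whose values are denoted by $w(a|\mathbf{b})$ for $a \in \mathcal{A}$, $\mathbf{b} \in \mathcal{A}^{2r+1}$, satisfying $\sum_{a \in \mathcal{A}} w(a|\mathbf{b}) = 1$, be called the local transition function of radius $r$, and its values will be called local transition probabilities. A probabilistic cellular automaton with local transition function $w$ is a map $F : \mathfrak{M}_{\sigma}(X) \to \mathfrak{M}_{\sigma}(X)$ defined as

$$(F\mu)([\mathbf{b}]_i) = \sum_{\mathbf{a} \in \mathcal{A}^{|\mathbf{b}|+2r}} w(\mathbf{b}|\mathbf{a})\, \mu([\mathbf{a}]_{i-r}) \quad \text{for all } i \in \mathbb{Z},\ \mathbf{b} \in \mathcal{A}^*, \tag{47}$$

where, for a block $\mathbf{b}$ and a block $\mathbf{a}$ of length $|\mathbf{b}| + 2r$, we define

$$w(\mathbf{b}|\mathbf{a}) = \prod_{j=1}^{|\mathbf{b}|} w(b_j | a_j a_{j+1} \ldots a_{j+2r}). \tag{48}$$

When the function $w$ takes values in the set $\{0, 1\}$, the corresponding cellular automaton is called a deterministic CA.

For any probability measure $\mu \in \mathfrak{M}_{\sigma}(X)$, we define the orbit of $\mu$ under $F$ as

$$\{F^n \mu\}_{n=0}^{\infty}. \tag{49}$$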
Excluding trivial cases, computing the orbit of a measure under a given CA is very difficult, 
and no general method is known. We will, therefore, propose a method for approximating 
orbits based on the minimal entropy approximation. 

Let us first define the entropy minimizing operator of order $k$, denoted by $\Psi^{(k)}$, to be a map from $\mathfrak{M}_{\sigma}(X)$ to $\mathfrak{M}^{(k)}_{\sigma}(X)$ such that

$$\Psi^{(k)} \mu = \tilde{\mu}^{(k)}, \tag{50}$$

where $\tilde{\mu}^{(k)}$ is the measure defined in Proposition 3.1. Note that the operator $\Psi^{(k)}$ is idempotent, that is, $\Psi^{(k)} \Psi^{(k)} = \Psi^{(k)}$. This allows us to construct an approximate orbit of a measure $\mu$ under the action of $F$ by simply replacing $F$ by $\Psi^{(k)} F \Psi^{(k)}$. The sequence

$$\left\{ \left( \Psi^{(k)} F \Psi^{(k)} \right)^n \mu \right\}_{n=0}^{\infty} \tag{51}$$

will be called the minimal entropy approximation of level $k$ of the exact orbit $\{F^n \mu\}_{n=0}^{\infty}$. Note that all terms of this sequence are measures of minimal entropy, thus the entire approximate orbit lies in $\mathfrak{M}^{(k)}_{\sigma}(X)$.
Just like the local structure approximation, the minimal entropy approximation approximates the actual orbit increasingly well as $k$ increases. In fact, we will prove that every point of the approximate orbit weakly converges to the corresponding point of the exact orbit.



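Proposition 4.1 Let $k$ be a positive integer and $\mathbf{b} \in \mathcal{A}^*$. If $k \ge |\mathbf{b}| + 2r$, then

$$F\mu([\mathbf{b}]) = F\Psi^{(k)}\mu([\mathbf{b}]) = \Psi^{(k)}F\mu([\mathbf{b}]). \tag{52}$$

Proof. To prove it, note that $\mu([\mathbf{a}]) = \tilde{\mu}^{(k)}([\mathbf{a}])$ for all blocks $\mathbf{a}$ of length up to $k$. The first equality of (52) can be written as

$$\sum_{\mathbf{a} \in \mathcal{A}^{|\mathbf{b}|+2r}} w(\mathbf{b}|\mathbf{a})\, \mu([\mathbf{a}]) = \sum_{\mathbf{a} \in \mathcal{A}^{|\mathbf{b}|+2r}} w(\mathbf{b}|\mathbf{a})\, \tilde{\mu}^{(k)}([\mathbf{a}]). \tag{53}$$

The equality holds when $|\mathbf{a}| \le k$, that is, $|\mathbf{b}| + 2r \le k$.

The second equality of (52) is a result of the fact that the $\Psi^{(k)}$ operator only modifies probabilities of blocks of length greater than $k$. Since $k \ge |\mathbf{b}| + 2r$, we have $|\mathbf{b}| \le k$ and therefore $F\mu([\mathbf{b}]) = \Psi^{(k)}F\mu([\mathbf{b}])$. □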

Now let us note that $F^n$ can be viewed as a cellular automaton rule of radius $nr$, thus when $k \ge |\mathbf{b}| + 2nr$, we have $F^n\mu([\mathbf{b}]) = F^n\Psi^{(k)}\mu([\mathbf{b}])$. We can insert an arbitrary number of $\Psi^{(k)}$ operators on the right hand side anywhere we want, and nothing will change, because $\Psi^{(k)}$ does not modify the relevant block probabilities. This yields an immediate corollary.

Corollary 4.1 Let $k$ and $n$ be positive integers and $\mathbf{b} \in \mathcal{A}^*$. If $k \ge |\mathbf{b}| + 2nr$, then

$$F^n\mu([\mathbf{b}]) = \left( \Psi^{(k)} F \Psi^{(k)} \right)^n \mu([\mathbf{b}]).$$

This means that for a given $n$, measures of cylinder sets in the approximate measure $(\Psi^{(k)} F \Psi^{(k)})^n \mu$ coincide with measures of cylinder sets in $F^n\mu$ for sufficiently large $k$. Because cylinder sets constitute a convergence determining class for measures, we obtain the following result.

Theorem 4.1 Let $F$ be a cellular automaton, $\mu \in \mathfrak{M}_{\sigma}(X)$ be a shift-invariant measure, and $\nu_n^{(k)}$ be a minimal entropy approximation of order $k$ of the measure $F^n\mu$, i.e., $\nu_n^{(k)} = (\Psi^{(k)} F \Psi^{(k)})^n \mu$. Then for any positive integer $n$, $\nu_n^{(k)} \Rightarrow F^n\mu$ as $k \to \infty$.



5. Minimal entropy maps 

Minimal entropy measures can be entirely described by specifying a finite number of block probabilities. We will use this feature to construct a finite-dimensional map which approximates the action of a CA rule on shift-invariant measures. If $\nu_n^{(k)} = (\Psi^{(k)} F \Psi^{(k)})^n \mu$, then $\nu_n^{(k)}$ satisfies the recurrence equation

$$\nu_{n+1}^{(k)} = \Psi^{(k)} F \Psi^{(k)} \nu_n^{(k)}. \tag{54}$$

On both sides of this equation we have measures in $\mathfrak{M}^{(k)}_{\sigma}(X)$, and these are completely determined by probabilities of blocks of length $k$. If $|\mathbf{b}| = k$, we obtain

$$\nu_{n+1}^{(k)}([\mathbf{b}]) = \Psi^{(k)} F \Psi^{(k)} \nu_n^{(k)}([\mathbf{b}]), \tag{55}$$

and, since $\Psi^{(k)}$ does not modify probabilities of blocks of length $k$, this simplifies to

$$\nu_{n+1}^{(k)}([\mathbf{b}]) = F \Psi^{(k)} \nu_n^{(k)}([\mathbf{b}]). \tag{56}$$

By the definition of $F$,

$$\nu_{n+1}^{(k)}([\mathbf{b}]) = \sum_{\mathbf{a} \in \mathcal{A}^{|\mathbf{b}|+2r}} w(\mathbf{b}|\mathbf{a}) \left( \Psi^{(k)} \nu_n^{(k)} \right)([\mathbf{a}]). \tag{57}$$

To simplify the notation, let us define $Q_n(\mathbf{b}) = \nu_n^{(k)}([\mathbf{b}])$, and, consistent with the definition in eq. (46), $\tilde{Q}_n(\mathbf{a}) = (\Psi^{(k)} \nu_n^{(k)})([\mathbf{a}])$. Then we can rewrite the previous equation to take the form

$$Q_{n+1}(\mathbf{b}) = \sum_{\mathbf{a} \in \mathcal{A}^{|\mathbf{b}|+2r}} w(\mathbf{b}|\mathbf{a})\, \tilde{Q}_n(\mathbf{a}). \tag{58}$$

Note that by eq. (46), $\tilde{Q}_n(\mathbf{a})$ depends only on probabilities of blocks of length $k$. If we thus arrange $Q_n(\mathbf{b})$ for all $\mathbf{b} \in \mathcal{A}^k$ in lexicographical order to form a vector $\mathbf{Q}_n$, we will obtain

$$\mathbf{Q}_{n+1} = U^{(k)}(\mathbf{Q}_n), \tag{59}$$

where $U^{(k)} : [0, 1]^{|\mathcal{A}|^k} \to [0, 1]^{|\mathcal{A}|^k}$ has components defined by eq. (58). We will call this map a minimal entropy map of order $k$.



6. Example: elementary CA rule 26 

As an example, consider rule 26 given by

$$\begin{aligned}
w(1|000) &= 0, & w(1|001) &= 1, & w(1|010) &= 0, & w(1|011) &= 1, \\
w(1|100) &= 1, & w(1|101) &= 0, & w(1|110) &= 0, & w(1|111) &= 0,
\end{aligned} \tag{60}$$

and suppose we wish to construct the minimal entropy map of order 2 for this rule. Let $P_n(\mathbf{b}) = F^n\mu([\mathbf{b}])$. Using eq. (47) we obtain, for $r = 1$ and $|\mathbf{b}| = 2$,

$$P_{n+1}(\mathbf{b}) = \sum_{\mathbf{a} \in \mathcal{A}^{|\mathbf{b}|+2}} w(\mathbf{b}|\mathbf{a})\, P_n(\mathbf{a}). \tag{61}$$

Using the definition of $w(\mathbf{b}|\mathbf{a})$ given in eq. (48) and transition probabilities given in eq. (60) we obtain

$$\begin{aligned}
P_{n+1}(00) &= P_n(0000) + P_n(0101) + P_n(1010) + P_n(1101) + P_n(1110) + P_n(1111), \\
P_{n+1}(01) &= P_n(0001) + P_n(0100) + P_n(1011) + P_n(1100), \\
P_{n+1}(10) &= P_n(0010) + P_n(0110) + P_n(0111) + P_n(1000), \\
P_{n+1}(11) &= P_n(0011) + P_n(1001).
\end{aligned} \tag{62}$$

This set of equations describes the exact relationship between block probabilities at step $n + 1$ and block probabilities at step $n$. Note that 2-block probabilities at step $n + 1$ are given in terms of 4-block probabilities at step $n$, thus it is not possible to iterate these equations.
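
The preimage lists appearing in such equations can be generated mechanically. The following sketch assumes the standard Wolfram numbering of elementary rules (the output for neighbourhood xyz is bit 4x + 2y + z of the rule number), which agrees with the transition table of rule 26 given above; the function names are hypothetical.

```python
from itertools import product

RULE = 26

def local_rule(x, y, z):
    """Output of the elementary rule for the neighbourhood (x, y, z), Wolfram numbering."""
    return (RULE >> (4 * x + 2 * y + z)) & 1

def block_image(a):
    """Image of a binary block a (string) of length m under the rule; the result has length m - 2."""
    return "".join(str(local_rule(int(a[j]), int(a[j + 1]), int(a[j + 2])))
                   for j in range(len(a) - 2))

def preimages(length):
    """Map each block of the given length to the list of its preimage blocks of length + 2."""
    table = {}
    for bits in product("01", repeat=length + 2):
        a = "".join(bits)
        table.setdefault(block_image(a), []).append(a)
    return table

for b, pre in sorted(preimages(2).items()):
    print(b, pre)
```

Each printed line lists, for one 2-block at step n + 1, the 4-blocks at step n whose probabilities are summed; calling `preimages(3)` produces the analogous lists of 5-blocks for 3-blocks.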



The minimal entropy map of order 2 (eq. 59) can be obtained by simply replacing $P$ by $Q$ and placing a tilde over probabilities on the right hand side of eq. (62). This yields

$$\begin{aligned}
Q_{n+1}(00) &= \tilde{Q}_n(0000) + \tilde{Q}_n(0101) + \tilde{Q}_n(1010) + \tilde{Q}_n(1101) + \tilde{Q}_n(1110) + \tilde{Q}_n(1111), \\
Q_{n+1}(01) &= \tilde{Q}_n(0001) + \tilde{Q}_n(0100) + \tilde{Q}_n(1011) + \tilde{Q}_n(1100), \\
Q_{n+1}(10) &= \tilde{Q}_n(0010) + \tilde{Q}_n(0110) + \tilde{Q}_n(0111) + \tilde{Q}_n(1000), \\
Q_{n+1}(11) &= \tilde{Q}_n(0011) + \tilde{Q}_n(1001).
\end{aligned} \tag{63}$$

Using eq. (46) with $k = 2$, one can express $\tilde{Q}_n(a_1 a_2 a_3 a_4)$ in terms of 2-block probabilities. For example,

$$\begin{aligned}
\tilde{Q}_n(0000) &= T\big(\tilde{Q}_n(000), \tilde{Q}_n(000), \tilde{Q}_n(001)\big) = \tilde{Q}_n(000) \\
&= T\big(\tilde{Q}_n(00), \tilde{Q}_n(00), \tilde{Q}_n(01)\big) = \tilde{Q}_n(00) = Q_n(00).
\end{aligned} \tag{64}$$

Similarly one obtains

$$\tilde{Q}_n(0101) = Q_n(01), \qquad \tilde{Q}_n(1010) = Q_n(10), \qquad \tilde{Q}_n(1111) = Q_n(11). \tag{65}$$

All other $\tilde{Q}_n(a_1 a_2 a_3 a_4)$ are equal to 0. This simplifies eq. (63) to

$$\begin{aligned}
Q_{n+1}(00) &= Q_n(00) + Q_n(01) + Q_n(10) + Q_n(11), \\
Q_{n+1}(01) &= 0, \\
Q_{n+1}(10) &= 0, \\
Q_{n+1}(11) &= 0.
\end{aligned} \tag{66}$$

This defines a minimal entropy map $U^{(2)} : [0, 1]^4 \to [0, 1]^4$ (cf. eq. 59) which can be iterated, albeit in this case it is a trivial map, which after one iteration reaches the fixed point $(1, 0, 0, 0)$, because $Q_n(00) + Q_n(01) + Q_n(10) + Q_n(11) = 1$. We need a higher order approximation in order to obtain a more "interesting" map.
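
The whole construction for an elementary rule can be sketched as a short program. The code below is an illustrative sketch rather than the author's implementation: function names are hypothetical, blocks are strings, the rule is given by its Wolfram number, and exact rational arithmetic is an assumption made to avoid rounding issues in the comparisons inside T. It extends k-block probabilities to (k+2)-blocks by the recursion of eq. (46) and then sums them over preimages as in eq. (58).

```python
from fractions import Fraction
from itertools import product

def T(alpha, beta, delta):
    """The function T of eq. (40)."""
    if abs(delta - alpha) < abs(beta - alpha):
        return max(0, alpha - delta)
    return min(alpha, beta)

def extend(Q, k, length):
    """Minimal entropy probabilities of all blocks of the given length >= k, from k-block dict Q."""
    cur = dict(Q)
    for n in range(k + 1, length + 1):
        nxt = {}
        for bits in product("01", repeat=n - 2):
            a = "".join(bits)
            al, be, de = cur["0" + a], cur[a + "0"], cur[a + "1"]
            t = T(al, be, de)
            nxt["0" + a + "0"] = t            # T_{0,0}
            nxt["0" + a + "1"] = al - t       # T_{0,1}
            nxt["1" + a + "0"] = be - t       # T_{1,0}
            nxt["1" + a + "1"] = de - al + t  # T_{1,1}
        cur = nxt
    return cur

def minimal_entropy_map(Q, k, rule=26):
    """One application of the order-k minimal entropy map for an elementary (radius 1) rule."""
    ext = extend(Q, k, k + 2)
    out = {"".join(b): 0 for b in product("01", repeat=k)}
    for a, prob in ext.items():
        image = "".join(str((rule >> int(a[j:j + 3], 2)) & 1) for j in range(k))
        out[image] += prob
    return out

# Order 2, rule 26, starting from a Bernoulli measure with P(1) = 3/10.
p = Fraction(3, 10)
Q0 = {"".join(b): p ** b.count("1") * (1 - p) ** b.count("0") for b in product("01", repeat=2)}
Q1 = minimal_entropy_map(Q0, 2)
print(Q1)
```

With `k=2` the result should reproduce the trivial map discussed above, while passing 3-block probabilities and `k=3` gives one step of the 8-dimensional map.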

When $k = 3$, we follow the same procedure as for the $k = 2$ case discussed above. If we write eq. (58) for all possible $\mathbf{b} \in \mathcal{A}^3$, we will have on the left hand sides eight block probabilities $Q(b_1 b_2 b_3)$, thus the resulting minimal entropy map will be 8-dimensional,

$$\begin{aligned}
Q_{n+1}(000) &= \tilde{Q}_n(00000) + \tilde{Q}_n(01010) + \tilde{Q}_n(10101) + \tilde{Q}_n(11010) + \tilde{Q}_n(11101) \\
&\quad + \tilde{Q}_n(11110) + \tilde{Q}_n(11111), \\
Q_{n+1}(001) &= \tilde{Q}_n(00001) + \tilde{Q}_n(01011) + \tilde{Q}_n(10100) + \tilde{Q}_n(11011) + \tilde{Q}_n(11100), \\
Q_{n+1}(010) &= \tilde{Q}_n(00010) + \tilde{Q}_n(01000) + \tilde{Q}_n(10110) + \tilde{Q}_n(10111) + \tilde{Q}_n(11000), \\
Q_{n+1}(011) &= \tilde{Q}_n(00011) + \tilde{Q}_n(01001) + \tilde{Q}_n(11001), \\
Q_{n+1}(100) &= \tilde{Q}_n(00101) + \tilde{Q}_n(01101) + \tilde{Q}_n(01110) + \tilde{Q}_n(01111) + \tilde{Q}_n(10000), \\
Q_{n+1}(101) &= \tilde{Q}_n(00100) + \tilde{Q}_n(01100) + \tilde{Q}_n(10001), \\
Q_{n+1}(110) &= \tilde{Q}_n(00110) + \tilde{Q}_n(00111) + \tilde{Q}_n(10010), \\
Q_{n+1}(111) &= \tilde{Q}_n(10011).
\end{aligned} \tag{67}$$

On the right hand side, we have 32 block probabilities which have to be expressed in terms of 3-block probabilities by using eq. (46) with $k = 3$. Some of these will simplify to a single 3-block probability, e.g.,

$$\tilde{Q}_n(00000) = T\big(\tilde{Q}_n(0000), \tilde{Q}_n(0000), \tilde{Q}_n(0001)\big) = \tilde{Q}_n(0000) = Q_n(000). \tag{68}$$

Others, in general, will not simplify, and will have to be expressed by nested $T$ functions, for example

$$\tilde{Q}_n(00100) = T\Big( T\big(Q_n(001), Q_n(010), Q_n(011)\big),\ T\big(Q_n(010), Q_n(100), Q_n(101)\big),\ T_{0,1}\big(Q_n(010), Q_n(100), Q_n(101)\big) \Big). \tag{69}$$

Once we express all $\tilde{Q}_n(a_1 a_2 a_3 a_4 a_5)$ in eq. (67) by 3-block probabilities $Q_n(a_1 a_2 a_3)$, we obtain a map $[0, 1]^8 \to [0, 1]^8$. We omit explicit formulae for this map due to its complexity. One should stress, however, that only four components of this map are independent, and that by exploiting consistency conditions for block probabilities it is possible to reduce this map to $[0, 1]^4 \to [0, 1]^4$. We refer the interested reader to [5], where we explained how to perform such a reduction for local structure maps (the same method can be used for minimal entropy maps).

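Just for the sake of comparison, let us also write the local structure map of order 3 for rule 26. It can be obtained from eq. (67) by replacing $\tilde{Q}$ with $\hat{Q}$,

$$\begin{aligned}
Q_{n+1}(000) &= \hat{Q}_n(00000) + \hat{Q}_n(01010) + \hat{Q}_n(10101) + \hat{Q}_n(11010) + \hat{Q}_n(11101) \\
&\quad + \hat{Q}_n(11110) + \hat{Q}_n(11111), \\
Q_{n+1}(001) &= \hat{Q}_n(00001) + \hat{Q}_n(01011) + \hat{Q}_n(10100) + \hat{Q}_n(11011) + \hat{Q}_n(11100), \\
Q_{n+1}(010) &= \hat{Q}_n(00010) + \hat{Q}_n(01000) + \hat{Q}_n(10110) + \hat{Q}_n(10111) + \hat{Q}_n(11000), \\
Q_{n+1}(011) &= \hat{Q}_n(00011) + \hat{Q}_n(01001) + \hat{Q}_n(11001), \\
Q_{n+1}(100) &= \hat{Q}_n(00101) + \hat{Q}_n(01101) + \hat{Q}_n(01110) + \hat{Q}_n(01111) + \hat{Q}_n(10000), \\
Q_{n+1}(101) &= \hat{Q}_n(00100) + \hat{Q}_n(01100) + \hat{Q}_n(10001), \\
Q_{n+1}(110) &= \hat{Q}_n(00110) + \hat{Q}_n(00111) + \hat{Q}_n(10010), \\
Q_{n+1}(111) &= \hat{Q}_n(10011),
\end{aligned} \tag{70}$$

where $\hat{Q}_n$ denotes the Bayesian extension of order 3,

$$\hat{Q}_n(a_1 a_2 a_3 a_4 a_5) = \frac{Q_n(a_1 a_2 a_3)\, Q_n(a_2 a_3 a_4)\, Q_n(a_3 a_4 a_5)}{Q_n(a_2 a_3)\, Q_n(a_3 a_4)}, \tag{71}$$

with $Q_n(a_2 a_3) = Q_n(a_2 a_3 0) + Q_n(a_2 a_3 1)$, and with the fraction taken to be zero when its denominator vanishes.

Both minimal entropy maps and local structure maps become rather complicated when $k$ increases. Because of high dimensionality and strong nonlinearity, it is difficult to perform standard stability analysis for these maps. It is, however, rather straightforward to write a computer program which constructs and iterates them.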



7. Experimental results 

As we already mentioned, orbits of minimal entropy maps approximate orbits of measures under cellular automata rules. By iterating the minimal entropy map, we can obtain approximate $P_n(\mathbf{a})$, that is, the probability of occurrence of block $\mathbf{a}$ after $n$ iterations of a given cellular automaton rule. How good is this approximation, and is it any better than the local structure approximation?

In order to shed some light on this question, we considered the following problem. Suppose that the initial measure is a Bernoulli measure $\mu_p$, so that

$$\mu_p([\mathbf{a}]) = P_0(\mathbf{a}) = p^j (1 - p)^{|\mathbf{a}| - j}, \tag{72}$$

where $j$ is the number of ones in $\mathbf{a}$, $|\mathbf{a}| - j$ is the number of zeros in $\mathbf{a}$, and $p \in [0, 1]$. The probability of occurrence of $\mathbf{a}$ after $n$ iterations is then given by

$$P_n(\mathbf{a}) = (F^n \mu_p)([\mathbf{a}]). \tag{73}$$

The expected value of a given cell after the $n$-th iteration of the rule, to be denoted $p_n$, is given by

$$p_n = 1 \cdot P_n(1) + 0 \cdot P_n(0) = P_n(1). \tag{74}$$

We will call $p_n$ the density of ones at time $n$. The density can be estimated numerically by starting with an array of sites and setting each one of them independently to 1 or 0 with probability $p$ or $1 - p$, respectively. We then iterate rule $F$ $n$ times (using periodic boundary conditions) and count how many cells are in state 1. The count divided by the number of sites serves as a numerical estimate of $p_n$.
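
The numerical estimate described above can be sketched as follows. The array size, the seed, and the function names are arbitrary choices made for this illustration, and Wolfram numbering of elementary rules is assumed.

```python
import random

def step(cells, rule=26):
    """One synchronous update of an elementary CA with periodic boundary conditions."""
    n = len(cells)
    return [(rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

def estimate_density(p, n_steps, size=10000, rule=26, seed=0):
    """Fraction of ones after n_steps iterations, starting from a random Bernoulli(p) array."""
    rng = random.Random(seed)
    cells = [1 if rng.random() < p else 0 for _ in range(size)]
    for _ in range(n_steps):
        cells = step(cells, rule)
    return sum(cells) / size

# Sample a few points of an experimental density response curve.
for p in (0.1, 0.5, 0.9):
    print(p, estimate_density(p, n_steps=100))
```

Repeating the estimate over a fine grid of initial densities, with a larger array and more iterations, gives an experimental curve against which the map-based approximations can be compared.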

One can also estimate $p_n$ by iterating the $k$-th order minimal entropy map $n$ times starting from initial conditions given by eq. (72), that is, $Q_0(\mathbf{a}) = P_0(\mathbf{a})$. Then we compute $Q_n(1)$ by using consistency conditions,

$$Q_n(1) = \sum_{\mathbf{a} \in \{0,1\}^{k-1}} Q_n(1\mathbf{a}),$$

and $Q_n(1)$ is used as an approximation of $p_n$, to be called the $k$-th order minimal entropy approximation of $p_n$. The analogous approximation using the local structure map will be called the $k$-th order local structure approximation of $p_n$.

An interesting question is now how $p_n$ depends on $p_0$. The plot of $p_n$ vs. $p_0$ is called the density response curve. We plotted density response curves using "experimental" $p_n$ as well as using the minimal entropy approximation and the local structure approximation, both for orders $k = 1, 2, \ldots, 7$. We found that, generally, as the order of the approximation increases, density response curves obtained by iterating minimal entropy maps become closer and closer to "experimental" curves. The same phenomenon is observed for density response curves obtained by iterating local structure maps.

For most elementary rules, both local structure maps and minimal entropy maps produce good approximations of density curves. There are two exceptions, however: elementary CA rules 26 and 41. Here we will discuss rule 26 as an example. The experimental density response curve is shown as the continuous curve in Figure 3. Remarkably, density curves obtained by iterations of local structure maps up to order 7 are horizontal straight lines, as shown in Figure 3(a). One can say, therefore, that the local structure theory fails to predict the correct shape of the density curve, at least for $k \le 7$.

In contrast to this, density curves obtained by iterations of minimal entropy maps, shown in Figure 3(b), approximate the shape of the "experimental" density curve much better, even at order 3. The minimal entropy approximation, therefore, clearly outperforms the local structure approximation in this case.



8. Conclusions 



We introduced the notion of the minimal entropy approximation of probability measures over binary bisequences. The minimal entropy approximation can be viewed as an opposite of the Bayesian approximation, which maximizes entropy. We then demonstrated how the minimal entropy approximation can be used to construct approximations of orbits of measures under the action of deterministic or probabilistic cellular automata. Such approximate orbits can be fully characterized by orbits of finite-dimensional maps, which we call minimal entropy maps. While points of approximate orbits of measures obtained by iterating minimal entropy maps weakly converge to corresponding points of the exact orbits, just as in the case of approximate orbits of local structure theory, there are cases when the minimal entropy approximation works better than the local structure approximation. This is the case for elementary CA rule 26, for which the local structure theory fails in predicting the correct shape of the density response curve for $k \le 7$. The minimal entropy approximation yields a fairly accurate prediction for the density response curve of rule 26, starting with $k = 3$.

Figure 3: Density response curves for rule 26 for t = 10^ obtained by iteration of local structure maps (a) and minimal entropy maps (b).

An interesting question is why the minimal entropy approximation is better than the maximal entropy approximation in the case of rule 26. One could naively think that this is because the time evolution of rule 26 is somewhat more "ordered" than for other rules. This is, however, not true: there are other rules for which the spatiotemporal patterns are even more "ordered" than for rule 26, yet both maximal and minimal entropy approximations seem to work for them equally well. In order to probe this issue further, one will need to find more examples of rules for which the minimal entropy approximation outperforms the local structure theory. A natural way to go beyond the elementary CA rules considered here is to search for such examples among either probabilistic CA rules of radius 1, or deterministic CA rules of radius greater than 1. Both possibilities are currently being investigated by the author.